Large numbers of child nodes

ModeShape 3 has been designed to efficiently handle a single node having a very large number (>100K) of child nodes. It does this by segmenting the parent's list of child references into multiple blocks, where each block is small enough to deal with.

ModeShape performs this optimization in the background rather than during the Session's save() operation. As a consequence, the actual number of child references stored in any block might vary significantly from the "optimal" value. While ModeShape can transparently handle blocks of any size, performance with very large numbers of child nodes improves when the block sizes are optimized.
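
The segmenting idea can be illustrated with a small, self-contained Java sketch. It is a toy model under stated assumptions, not ModeShape's internal implementation: the class name ChildBlocksSketch and the method segment are hypothetical, child references are reduced to plain strings, and every block is filled exactly to the target size (the real optimizer runs in the background and allows some variance).

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model only: hypothetical names, not ModeShape's internal implementation. */
final class ChildBlocksSketch {

    /** Splits child references into consecutive blocks of at most targetSize entries. */
    static List<List<String>> segment(List<String> children, int targetSize) {
        if (targetSize < 1) {
            throw new IllegalArgumentException("targetSize must be positive");
        }
        List<List<String>> blocks = new ArrayList<>();
        for (int start = 0; start < children.size(); start += targetSize) {
            int end = Math.min(start + targetSize, children.size());
            // Copy the slice so each block is independent of the source list.
            blocks.add(new ArrayList<>(children.subList(start, end)));
        }
        return blocks;
    }

    public static void main(String[] args) {
        List<String> children = new ArrayList<>();
        for (int i = 0; i < 2500; i++) {
            children.add("child" + i);
        }
        List<List<String>> blocks = segment(children, 1000);
        System.out.println(blocks.size() + " blocks; last block holds "
                + blocks.get(blocks.size() - 1).size() + " references");
    }
}
```

With 2,500 children and a target of 1,000, this model produces three blocks, the last holding 500 references. Child order is preserved across blocks.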

The segmenting function is not enabled by default for a repository, meaning that ModeShape will store all of a parent's child references in a single block. To enable it, add the following to the repository configuration:

JSON

"storage" : {
   "documentOptimization" : {
      "childCountTarget" : 1000,
      "childCountTolerance" : 10,
      "threadPool" : "modeshape-opt",
      "initialTime" : "00:00",
      "intervalInHours" : 24
   }
}

or:

EAP Configuration

            
   <repository name="sample"
      document-optimization-child-count-target="1000"
      document-optimization-child-count-tolerance="10"
      document-optimization-initial-time="02:00"
      document-optimization-interval="24"
      document-optimization-thread-pool="modeshape-opt"
   />

where the first two attributes set the desired number of children per block and the allowed variance, while the last three control when the optimization runs and which thread pool supplies the threads that perform it.

ModeShape performs well when using a single block for storing child references, even for moderate numbers of children (~10K).

Accessing by path

Navigating to a node by its path is perhaps one of the most common access patterns in JCR. This uses the 'Node.getNode(String)' method, which takes a relative path and essentially boils down to finding a particular child node with the supplied name and same-name-sibling (SNS) index. ModeShape internally indexes the children in each block by name, so finding a node by name (and SNS index) is as fast as possible, even if multiple blocks need to be accessed.
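
As a rough illustration of a lookup by name and SNS index over several blocks, the following Java sketch gives each block its own name index. It is a toy model: NameLookupSketch, Block and find are hypothetical names, and the data layout is an assumption made for the example rather than ModeShape's actual storage format.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Toy model only: hypothetical names, not ModeShape's internal implementation. */
final class NameLookupSketch {

    /** One block: child name -> identifiers of the children with that name, in order. */
    static final class Block {
        final Map<String, List<String>> idsByName = new HashMap<>();

        void add(String name, String id) {
            idsByName.computeIfAbsent(name, k -> new ArrayList<>()).add(id);
        }
    }

    /** Resolves name[snsIndex] (1-based, as in JCR paths) across the blocks, in order. */
    static Optional<String> find(List<Block> blocks, String name, int snsIndex) {
        if (snsIndex < 1) {
            throw new IllegalArgumentException("SNS index is 1-based");
        }
        int remaining = snsIndex;
        for (Block block : blocks) {
            List<String> ids = block.idsByName.get(name); // per-block index lookup
            if (ids == null) {
                continue; // no child with this name in this block
            }
            if (remaining <= ids.size()) {
                return Optional.of(ids.get(remaining - 1));
            }
            remaining -= ids.size(); // skip same-name siblings held by earlier blocks
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Block first = new Block();
        first.add("report", "id-1");
        first.add("image", "id-2");
        Block second = new Block();
        second.add("report", "id-3");

        List<Block> blocks = List.of(first, second);
        System.out.println(find(blocks, "report", 2).orElse("not found"));
    }
}
```

In this model the second "report" child lives in the second block, so resolving report[2] consults the name index of both blocks without scanning any block's full child list.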

Iterating

Another common access pattern is to iterate over some or all of a parent node's children, using the 'Node.getNodes()' and 'Node.getNodes(String)' methods. The resulting NodeIterator will transparently access the children in one block at a time, and will continue with all blocks until the last child reference is found or until the caller halts the iteration.
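
The block-at-a-time behaviour can be sketched in Java as an iterator that asks for the next block only once the current one is used up. This is a toy model under assumptions: BlockIteratorSketch and its loader function are hypothetical, and the loader signals "no more blocks" by returning null, which is a convention chosen for this example only.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.IntFunction;

/** Toy model only: hypothetical names, not ModeShape's NodeIterator implementation. */
final class BlockIteratorSketch implements Iterator<String> {

    // Loads the block at the given index; returns null when no such block exists.
    private final IntFunction<List<String>> loader;
    private Iterator<String> current = Collections.emptyIterator();
    private int nextBlock = 0;
    private boolean exhausted = false;

    BlockIteratorSketch(IntFunction<List<String>> loader) {
        this.loader = loader;
    }

    @Override
    public boolean hasNext() {
        // Fetch further blocks only when the current block has been fully consumed.
        while (!current.hasNext() && !exhausted) {
            List<String> block = loader.apply(nextBlock++);
            if (block == null) {
                exhausted = true;
            } else {
                current = block.iterator();
            }
        }
        return current.hasNext();
    }

    @Override
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return current.next();
    }

    public static void main(String[] args) {
        List<List<String>> stored = List.of(List.of("a", "b"), List.of("c"));
        int[] loads = {0}; // counts how many times a block was requested
        Iterator<String> it = new BlockIteratorSketch(i -> {
            loads[0]++;
            return i < stored.size() ? stored.get(i) : null;
        });
        System.out.println(it.next() + " after " + loads[0] + " block load(s)");
    }
}
```

A caller that stops after the first child causes only the first block to be loaded in this model, which mirrors the point that halting the iteration early avoids touching the remaining blocks.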

Accessing by identifier

Another common access pattern is to find a node by identifier, using the 'Session.getNodeByIdentifier(String)' method. ModeShape handles this request by finding the node directly by its identifier, and needs to access the parent's (or ancestors') child references only when the caller requests the node's name or path (via the 'Node.getName()' or 'Node.getPath()' methods).
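
The difference between finding a node and resolving its name or path can be sketched with a small Java model. It assumes, for illustration only, that a node's name is kept in its parent's child references; IdentifierLookupSketch, NodeDoc and the childRefReads counter are hypothetical, and same-name-sibling indexes are left out of the path for brevity.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model only: hypothetical names, not ModeShape's internal implementation. */
final class IdentifierLookupSketch {

    /** A stored node that knows its own identifier and its parent's, but not its name. */
    static final class NodeDoc {
        final String id;
        final String parentId; // null for the root node

        NodeDoc(String id, String parentId) {
            this.id = id;
            this.parentId = parentId;
        }
    }

    private final Map<String, NodeDoc> docsById = new HashMap<>();
    // parent identifier -> (child identifier -> child name): the parent's child references
    private final Map<String, Map<String, String>> childRefsByParent = new HashMap<>();
    int childRefReads = 0; // counts how often child references were consulted

    void add(String id, String parentId, String name) {
        docsById.put(id, new NodeDoc(id, parentId));
        if (parentId != null) {
            childRefsByParent.computeIfAbsent(parentId, k -> new HashMap<>()).put(id, name);
        }
    }

    /** Direct lookup: no child references are touched. */
    NodeDoc getById(String id) {
        return docsById.get(id);
    }

    /** The name comes from the parent's child references. */
    String nameOf(NodeDoc node) {
        if (node.parentId == null) {
            return "";
        }
        childRefReads++;
        return childRefsByParent.get(node.parentId).get(node.id);
    }

    /** The path needs the child references of every ancestor up to the root. */
    String pathOf(NodeDoc node) {
        if (node.parentId == null) {
            return "/";
        }
        StringBuilder path = new StringBuilder();
        for (NodeDoc n = node; n.parentId != null; n = docsById.get(n.parentId)) {
            path.insert(0, "/" + nameOf(n));
        }
        return path.toString();
    }

    public static void main(String[] args) {
        IdentifierLookupSketch repo = new IdentifierLookupSketch();
        repo.add("root", null, null);
        repo.add("a1", "root", "docs");
        repo.add("b2", "a1", "report");

        NodeDoc node = repo.getById("b2"); // childRefReads is still 0 here
        System.out.println(repo.pathOf(node) + " (child-reference reads: " + repo.childRefReads + ")");
    }
}
```

In this model, fetching the node by identifier costs no child-reference reads, while asking for its path costs one read per ancestor level.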

Additional performance considerations

See http://modeshape.wordpress.com/2014/08/14/improving-performance-with-large-numbers-of-child-nodes